libnvme: internalize uring dispatch and fix NULL hdl dereference#3343
Open
martin-belanger wants to merge 3 commits intolinux-nvme:masterfrom
Open
libnvme: internalize uring dispatch and fix NULL hdl dereference#3343martin-belanger wants to merge 3 commits intolinux-nvme:masterfrom
martin-belanger wants to merge 3 commits intolinux-nvme:masterfrom
Conversation
Author
|
This PR replaces: #3341 |
igaw
reviewed
May 8, 2026
Collaborator
|
And I forgot to mention we also would need this for |
added 3 commits
May 8, 2026 14:59
Move the io_uring vs. ioctl routing decision out of callers and into libnvme_submit_admin_passthru() itself. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
Every caller of libnvme_submit_admin_passthru() must follow up with libnvme_wait_complete_passthru() to drain io_uring completions before inspecting response data. Functions with two sequential submits where the second depends on the first's response (libnvme_set_etdas, libnvme_clear_etdas, libnvme_get_uuid_list) get an intermediate wait between the two submits. libnvme_get_log() already had a conditional wait guarded by hdl->uring_enabled; make it unconditional now that libnvme_wait_complete_passthru() is a true no-op in the no-uring build. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
libnvme_ctrl_get_transport_handle() can return NULL when libnvme_open() fails during rapid device teardown. Add a NULL check at the top of both libnvme_submit_admin_passthru() and libnvme_submit_io_passthru() so the condition is caught centrally rather than causing a NULL dereference in callers. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
2fee4a7 to
16dc0db
Compare
Author
|
@igaw, I was thinking of introducing a helper function, err = libnvme_submit_admin_passthru(hdl, &cmd);
if (err)
return err;
err = libnvme_wait_complete_passthru(hdl);
if (err)
return err;This could be simplified as: err = libnvme_admin_passthru(hdl, &cmd);
if (err)
return err;Here's what the helper would look like: static inline int libnvme_admin_passthru(struct libnvme_transport_handle *hdl,
struct libnvme_passthru_cmd *cmd)
{
int err = libnvme_submit_admin_passthru(hdl, cmd);
return err ? err : libnvme_wait_complete_passthru(hdl);
}I also noticed that What do you think? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
libnvme_ctrl_get_transport_handle()uses lazy-open semantics: it opens/dev/nvmeXon first use. When the device is mid-teardown (kernel returnsEAGAIN),libnvme_open()fails and the handle stays NULL. Three callers passed that NULL directly intolibnvme_submit_admin_passthru(), which dereferenced it immediately at thehdl->uring_enabledcheck — causing a segfault reproducible with nvme-stas under rapid device create/delete cycling.Fix
Per your suggestion, the NULL guard is now centralized in
ioctl.crather than scattered across call sites.While moving it there, the io_uring vs. ioctl routing was also pulled out of callers and into
libnvme_submit_admin_passthru()itself.libnvme_wait_complete_passthru()is now a true no-op (return 0) in the no-uring build, so all callers can pair submit + wait unconditionally. The batching behaviour inlibnvme_get_log()is preserved.Question
@igaw - Should we also add
if (!hdl) return -ENODEV;with aLIBNVME_LOG_DEBUGmessage at the three call sites that obtainhdlfromlibnvme_ctrl_get_transport_handle()(nvme_discovery_log,nvmf_dim,libnvme_ctrl_identify)? This would provide a traceable log entry identifying the exact code path that hit the NULL, which is useful for diagnosing the race. Happy to add that as a follow-up commit if you think it's worthwhile.Testing
Verified with nvme-stas tests that rapidly create and delete NVMe devices — no crash after this patch.